Model selection for mixtures of mutagenetic trees.

Authors

  • Junming Yin
  • Niko Beerenwinkel
  • Jörg Rahnenführer
  • Thomas Lengauer
Abstract

The evolution of drug resistance in HIV is characterized by the accumulation of resistance-associated mutations in the HIV genome. Mutagenetic trees, a family of restricted Bayesian tree models, have been applied to infer the order and rate of occurrence of these mutations. Understanding and predicting this evolutionary process is an important prerequisite for the rational design of antiretroviral therapies. In practice, mixture models of K mutagenetic trees provide more flexibility and are often more appropriate for modelling observed mutational patterns. Here, we investigate the model selection problem for K-mutagenetic trees mixture models. We evaluate several classical model selection criteria, including cross-validation, the Bayesian Information Criterion (BIC), and the Akaike Information Criterion (AIC). We also use the empirical Bayes method by constructing a prior probability distribution for the parameters of a mutagenetic trees mixture model and deriving the posterior probability of the model. In addition to the model dimension, we consider the redundancy of a mixture model, which is measured by comparing the topologies of trees within a mixture model. Based on the redundancy, we propose a new model selection criterion, which is a modification of the BIC. Experimental results on simulated and on real HIV data show that the classical criteria tend to select models with far too many tree components. Only cross-validation and the modified BIC recover the correct number of trees and the tree topologies most of the time. At the same optimal performance, the runtime of the new BIC modification is about one order of magnitude lower. Thus, this model selection criterion can also be used for large data sets for which cross-validation becomes computationally infeasible.
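
As a rough illustration of how the two classical information criteria named in the abstract rank candidate numbers of tree components, the sketch below applies the standard textbook definitions of AIC and BIC to a set of fitted models. It does not implement the redundancy-based BIC modification, whose formula is not given in the abstract. The function names, the log-likelihoods, the parameter counts, and the sample size are all hypothetical values chosen for the example, not results from the paper.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_samples):
    """Bayesian Information Criterion (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_lik


def select_k(candidates, n_samples, criterion="bic"):
    """Pick the number of components K with the lowest criterion value.

    candidates maps K -> (maximized log-likelihood, number of free parameters).
    Returns the selected K and the score of every candidate.
    """
    scores = {}
    for k, (log_lik, n_params) in candidates.items():
        if criterion == "bic":
            scores[k] = bic(log_lik, n_params, n_samples)
        else:
            scores[k] = aic(log_lik, n_params)
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical fits for K = 1, 2, 3 (made-up numbers, not from the paper).
candidates = {1: (-1250.0, 6), 2: (-1180.0, 13), 3: (-1170.0, 20)}
n_samples = 300  # hypothetical number of observed mutational patterns

best_bic, _ = select_k(candidates, n_samples, criterion="bic")
best_aic, _ = select_k(candidates, n_samples, criterion="aic")
print("K selected by BIC:", best_bic)
print("K selected by AIC:", best_aic)
```

With these made-up inputs the two criteria disagree, because BIC charges log(n) per parameter while AIC charges a constant 2. Neither penalty depends on how similar the fitted trees are to each other, which is the quantity the abstract's redundancy measure captures.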

Similar resources

Mtreemix: a software package for learning and using mixture models of mutagenetic trees

SUMMARY Mixture models of mutagenetic trees constitute a class of probabilistic models for describing evolutionary processes that are characterized by the accumulation of permanent genetic changes. They have been applied to model the accumulation of chromosomal gains and losses in tumor development and the development of drug resistance-associated mutations in the HIV genome. Mtreemix is a softw...

Rtreemix: a package for estimating mutagenetic trees mixture models and genetic progression scores

The mixture of mutagenetic trees introduced in [1] is an evolutionary model that provides an interpretable probabilistic framework for modeling multiple paths of ordered accumulation of permanent genetic changes that can be used for describing many disease processes. Each path captures a possible route of disease development. These models are used to model HIV progression characterized by accum...

Estimating Evolutionary Pathways and Genetic Progression Scores with Rtreemix

In genetics, many evolutionary pathways can be modeled on the molecular level by the ordered accumulation of permanent changes. We have developed the class of mixture models of mutagenetic trees (Beerenwinkel et al., 2005a) that provides a suitable statistical framework for describing these processes. These models have been successfully applied to describe disease progression in cancer and in H...

Dimensions of Group-based Phylogenetic Mixtures

In this paper we study group-based Markov models of evolution and their mixtures. In the algebro-geometric setting, group-based phylogenetic tree models correspond to toric varieties, while their mixtures correspond to secant and join varieties. Determining properties of these secant and join varieties can aid both in model selection and establishing parameter identifiability. Here we explore ...

Statement of Research Interests

My general research interests lie in the statistical inference of stochastic processes superimposed on stochastically evolving networks. Stochastic processes are broadly defined to include Poisson processes, Markov chains, and non-linear stochastic differential equations. Stochastically evolving networks are meant to provide the biologically realistic contexts within which our stochastic proces...

Journal:
  • Statistical Applications in Genetics and Molecular Biology

Volume 5

Publication date: 2006